update: nvm representation #68

Open: wants to merge 138 commits into main

fia0 (Contributor) commented Feb 24, 2025

I've worked a bit on the previous draft (#63) of the nvm tree variant of haura's nodes to improve performance. This is the current version of this endeavour. Performance is improved both on the new tree variant and on the old "block" variant, thanks to various performance fixes throughout the storage engine. Additionally, StorageKinds are introduced to map tiers to storage media types. Oh, and we allow for YAML configs because I don't want to write foo: {{{{{[[{ option: null }]]}}}}} anymore.
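A minimal sketch of what a tier-to-media mapping could look like. Only the name `StorageKind` comes from this PR; the variants, `TierConfig`, `NodeLayout`, `kind_of`, and `preferred_layout` are hypothetical, and tying the NVM kind to the nvm node variant is an assumed example of how such a mapping might be used, not haura's actual logic.

```rust
/// Illustrative storage media kinds (assumed variants; haura's may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageKind {
    Hdd,
    Ssd,
    Nvm,
}

/// Hypothetical node layouts: the new nvm variant vs. the old "block" variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeLayout {
    Block,
    Nvm,
}

/// Hypothetical per-tier config entry; the tier index is its position in the list.
pub struct TierConfig {
    pub name: &'static str,
    pub kind: StorageKind,
}

/// Returns the media kind configured for `tier`, or `None` if the tier does not exist.
pub fn kind_of(tiers: &[TierConfig], tier: usize) -> Option<StorageKind> {
    tiers.get(tier).map(|t| t.kind)
}

/// Assumed policy: byte-addressable media get the nvm layout, everything else stays block-based.
pub fn preferred_layout(kind: StorageKind) -> NodeLayout {
    match kind {
        StorageKind::Nvm => NodeLayout::Nvm,
        StorageKind::Hdd | StorageKind::Ssd => NodeLayout::Block,
    }
}
```

Keeping the decision in one exhaustive `match` means that adding a new media kind fails to compile until every layout decision has been revisited.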

Also, some crucial bugs related to node balancing in haura are fixed. This PR arguably fixes some long-standing problems related to the epsilon of our B^epsilon-tree.
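For context, these are the usual textbook bounds for a B^ε-tree with node size B holding N entries (general asymptotics, not haura measurements). The pivot share of a node is kept at roughly B^ε so that the rest remains available as message buffer:

```math
\text{pivot space} \lesssim B^{\varepsilon}, \qquad \text{buffer space} \approx B - B^{\varepsilon}
```

```math
\text{point query: } O\!\left(\frac{\log_B N}{\varepsilon}\right), \qquad
\text{insert (amortized): } O\!\left(\frac{\log_B N}{\varepsilon\, B^{1-\varepsilon}}\right)
```

If pivots are allowed to fill the whole node (effectively ε = 1), the buffer disappears and the insert bound collapses to the B-tree cost O(log_B N).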

Compression still needs to be integrated with this variant of the tree as well as some safety checks.

fia0 and others added 30 commits January 29, 2025 12:24
They are meant to allow nodes to do their own integrity checks, like internal
checksumming of single entries. Analogously, this can be done for compression.
So for quite some time, sequential insertion constructed a tree which did not
really adhere to the B^epsilon-tree rules. This was due to the nodes-in-cache
optimization in the insertion code, which skips insertion into nodes when their
child nodes are in cache. As a result, a sequential workload created many leaves
and all their pivots were inserted into the parent node of the last node in
cache. This was never checked because we only call rebalance on the final node,
which was the last node in cache. Because of this, these parents grew without
checks and pivots were essentially just glued together. First, this slows down
searching in the node. Second, all access guarantees and the buffer space
normally available in the B^epsilon-tree are gone; with only pivots, our tree
essentially behaved like a B-tree in these scenarios. Why this was never caught
before I don't know, but this commit fixes the behavior by doing two things:

1. The `is_too_large` check of the node objects now includes the space division
   of at most B^epsilon space for pivots. As soon as a node oversteps this
   boundary it is split to adhere to the B^epsilon-tree construction, even
   though it might be smaller than 4m, 1m, or whatever the size limit is. This
   has implications on performance (positive and negative) but is the correct
   thing to do.
2. Before we check whether the child of the current node is in cache and can be
   modified, we check whether the current node is already too large. If this is
   the case, we DO NOT SKIP THE CURRENT NODE but instead insert the message into
   the current internal node. This causes more operations on insertion but also
   makes future updates as cheap as they are expected to be given the complexity
   of the B^epsilon-tree.
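A minimal sketch of both fixes, assuming node sizes are tracked in bytes and that the pivot budget is literally B^ε bytes for a node limit of B bytes. Only `is_too_large` is named in the commit; `InternalNode`, `NodeLimits`, `InsertTarget`, and `choose_target` are hypothetical simplifications, not haura's code.

```rust
/// Hypothetical, simplified internal node; sizes are in bytes.
pub struct InternalNode {
    pub pivot_bytes: usize,
    pub buffer_bytes: usize,
}

/// Assumed tuning parameters; haura's real constants and names may differ.
pub struct NodeLimits {
    pub max_node_bytes: usize, // B
    pub epsilon: f64,          // epsilon in (0, 1]
}

impl NodeLimits {
    /// Pivot budget B^epsilon, rounded to the nearest byte.
    pub fn max_pivot_bytes(&self) -> usize {
        (self.max_node_bytes as f64).powf(self.epsilon).round() as usize
    }
}

impl InternalNode {
    /// Fix 1: too large if the whole node exceeds B *or* the pivots exceed B^epsilon,
    /// so a node can be split while still well below `max_node_bytes`.
    pub fn is_too_large(&self, limits: &NodeLimits) -> bool {
        self.pivot_bytes + self.buffer_bytes > limits.max_node_bytes
            || self.pivot_bytes > limits.max_pivot_bytes()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum InsertTarget {
    /// Put the message into this node's buffer (rebalancing may follow).
    CurrentNode,
    /// Skip this node and descend into the cached child.
    CachedChild,
}

/// Fix 2: the size check runs *before* the nodes-in-cache shortcut.
pub fn choose_target(node: &InternalNode, limits: &NodeLimits, child_in_cache: bool) -> InsertTarget {
    if node.is_too_large(limits) {
        return InsertTarget::CurrentNode;
    }
    if child_in_cache {
        InsertTarget::CachedChild
    } else {
        InsertTarget::CurrentNode
    }
}
```

With B = 4 MiB and ε = 0.5, the pivot budget is 2048 bytes, so a node holding 4 KiB of pivots counts as too large even though it is nowhere near 4 MiB in total.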

In this context, another bug was fixed which highlights how problematic this
behavior was: the `get_with_info` code of the node was not able to fetch an
entry when it was not present in the leaves. Due to the bug in sequential tree
construction, this somehow went uncaught before. It is fixed now.
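A minimal sketch of why a lookup must consult internal buffers on the way down, assuming messages are plain upserts (no deletes or merge messages) and that a buffer higher in the tree always holds newer data than anything below it. `Node` and `get` are hypothetical stand-ins, not haura's `get_with_info`; child `i` is assumed to hold keys up to and including `pivots[i]`.

```rust
use std::collections::BTreeMap;

/// Hypothetical tree node; `children.len() == pivots.len() + 1` for internal nodes.
pub enum Node {
    Internal {
        pivots: Vec<Vec<u8>>,
        buffer: BTreeMap<Vec<u8>, Vec<u8>>,
        children: Vec<Node>,
    },
    Leaf {
        entries: BTreeMap<Vec<u8>, Vec<u8>>,
    },
}

/// Returns the newest value for `key`: the first hit on the root-to-leaf path wins.
pub fn get<'a>(node: &'a Node, key: &[u8]) -> Option<&'a [u8]> {
    match node {
        Node::Internal { pivots, buffer, children } => {
            // A pending message in this buffer shadows anything stored below it.
            if let Some(value) = buffer.get(key) {
                return Some(value.as_slice());
            }
            // Route to the first child whose pivot is >= key.
            let idx = pivots.partition_point(|p| p.as_slice() < key);
            get(&children[idx], key)
        }
        Node::Leaf { entries } => entries.get(key).map(|v| v.as_slice()),
    }
}
```

A lookup that only inspected leaves would return stale values for keys updated in a buffer and miss keys that so far exist only in a buffer.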
used the absolute storage size instead of cache size